Distribution-balanced stratified cross-validation for accuracy estimation

Authors

  • Xinchuan Zeng
  • Tony R. Martinez
Abstract

Cross-validation has often been applied in machine learning research for estimating the accuracies of classifiers. In this work, we propose an extension to this method, called distribution-balanced stratified cross-validation (DBSCV), which improves the estimation quality by providing balanced intraclass distributions when partitioning a data set into multiple folds. We have tested DBSCV on nine real-world and three artificial domains using the C4.5 decision tree classifier. The results show that DBSCV performs better (has smaller biases) than regular stratified cross-validation in most cases, especially when the number of folds is small. The analysis and experiments based on three artificial data sets also reveal that DBSCV is particularly effective when multiple intraclass clusters exist in a data set.
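
The abstract does not spell out the partitioning procedure, so the following is only a sketch of one way "balanced intraclass distributions" could be obtained, not the authors' exact algorithm. It assumes numeric feature vectors and Euclidean distance: within each class, instances are visited along a nearest-neighbor chain and dealt round-robin into the folds, so that close neighbors (and hence each intraclass cluster) are spread across folds. The function name `distribution_balanced_folds`, the choice of starting instance, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def distribution_balanced_folds(X, y, k):
    """Return k lists of instance indices (one list per fold).

    Illustrative assumption: within each class, walk a nearest-neighbor
    chain and deal the visited instances round-robin into the folds.
    """
    # Group instance indices by class label (plain stratification step).
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    next_fold = 0
    for members in by_class.values():
        remaining = list(members)
        # Assumption: start the chain at the first instance of the class.
        current = remaining.pop(0)
        while True:
            folds[next_fold].append(current)
            next_fold = (next_fold + 1) % k
            if not remaining:
                break
            # Move to the closest not-yet-assigned instance of this class.
            nearest = min(remaining, key=lambda j: math.dist(X[current], X[j]))
            remaining.remove(nearest)
            current = nearest
    return folds


# Toy data: two classes, each made of two well-separated clusters.
X = [(0.0, 0.0), (10.0, 0.0), (0.1, 0.0), (10.1, 0.0),
     (0.0, 5.0), (10.0, 5.0), (0.1, 5.0), (10.1, 5.0)]
y = ["a", "a", "a", "a", "b", "b", "b", "b"]
folds = distribution_balanced_folds(X, y, k=2)
```

With this toy data, each fold ends up holding one instance from every cluster of every class, which is the property the abstract credits for the reduced bias when the number of folds is small. A regular stratified split only guarantees the per-class counts, not the per-cluster ones.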


Similar resources

Bayes Interval Estimation on the Parameters of the Weibull Distribution for Complete and Censored Tests

A method for constructing confidence intervals on parameters of a continuous probability distribution is developed in this paper. The objective is to present a model for an uncertainty represented by parameters of a probability density function.  As an application, confidence intervals for the two parameters of the Weibull distribution along with their joint confidence interval are derived. The...


Impacts of Sample Design for Validation Data on the Accuracy of Feedforward Neural Network Classification

Validation data are often used to evaluate the performance of a trained neural network and in the selection of a network deemed optimal for the task at hand. Optimality is commonly assessed with a measure such as overall classification accuracy. The latter is often calculated directly from a confusion matrix showing the counts of cases in the validation set with particular labelling prope...


Performance Optimization of Data Mining Application Using Radial Basis Function Classifier

Text data mining is a process of exploratory data analysis. Classification maps data into predefined groups or classes. It is often referred to as supervised learning because the classes are determined before examining the data. This paper describes a proposed radial basis function classifier that performs comparative cross-validation for the existing radial basis function classifier. The feasibility ...


The Effect of Station Density and Regional Division on Spatial Distribution of Daily Rainfall

Rainfall is one of the most important climatic variables in the hydrological cycle. In flood estimation as well as environmental pollution studies in medium to large watersheds, not only must the temporal pattern of rainfall be known, but knowledge of its spatial distribution is also required. Estimation of daily rainfall distribution without comparison and selection of suitable methods may lead...



Journal:
  • J. Exp. Theor. Artif. Intell.

Volume 12  Issue

Pages  -

Publication date 2000